User-Specific Learning for Recognizing a Singer's Intended Pitch
نویسندگان
چکیده
We consider the problem of automatic vocal melody transcription: translating an audio recording of a sung melody into a musical score. While previous work has focused on finding the closest notes to the singer’s tracked pitch, we instead seek to recover the melody the singer intended to sing. Often, the melody a singer intended to sing differs from what they actually sang; our hypothesis is that this occurs in a singer-specific way. For example, a given singer may often be flat in certain parts of her range, or another may have difficulty with certain intervals. We thus pursue methods for singer-specific training which use learning to combine different methods for pitch prediction. In our experiments with human subjects, we show that via a short training procedure we can learn a singer-specific pitch predictor and significantly improve transcription of intended pitch over other methods. For an average user, our method gives a 20 to 30 percent reduction in pitch classification errors with respect to a baseline method which is comparable to commercial voice transcription tools. For some users, we achieve even more dramatic reductions. Our best results come from a combination of singer-specific-learning with non-singer-specific feature selection. We also discuss the implications of our work for training more general control signals. We make our experimental data available to allow others to replicate or extend
منابع مشابه
Trust Classification in Social Networks Using Combined Machine Learning Algorithms and Fuzzy Logic
Social networks have become the main infrastructure of today’s daily activities of people during the last decade. In these networks, users interact with each other, share their interests on resources and present their opinions about these resources or spread their information. Since each user has a limited knowledge of other users and most of them are anonymous, the trust factor plays an import...
متن کاملPerfect Tracking of Supercavitating Non-minimum Phase Vehicles Using a New Robust and Adaptive Parameter-optimal Iterative Learning Control
In this manuscript, a new method is proposed to provide a perfect tracking of the supercavitation system based on a new two-state model. The tracking of the pitch rate and angle of attack for fin and cavitator input is of the aim. The pitch rate of the supercavitation with respect to fin angle is found as a non-minimum phase behavior. This effect reduces the speed of command pitch rate. Control...
متن کاملLearning Pitch with STDP: A Computational Model of Place and Temporal Pitch Perception Using Spiking Neural Networks
Pitch perception is important for understanding speech prosody, music perception, recognizing tones in tonal languages, and perceiving speech in noisy environments. The two principal pitch perception theories consider the place of maximum neural excitation along the auditory nerve and the temporal pattern of the auditory neurons' action potentials (spikes) as pitch cues. This paper describes a ...
متن کاملChromagram visualization of the singing voice
The chromagram is a transformation of a signal's time-frequency properties into a temporally varying precursor of pitch. This transformation is based on perceptual observations concerning the auditory system and has been shown to possess several interesting mathematical properties. We illustrate the use of t~e chr?magram as a fast and robust method for visualizing attributes of a singer's voice...
متن کاملThe Singer's Formant and Speaker's Ring Resonance: A Long-Term Average Spectrum Analysis
OBJECTIVES We previously showed that a trained tenor's voice has the conventional singer's formant at the region of 3 kHz and another energy peak at 8-9 kHz. Singers in other operatic voice ranges are assumed to have the same peak in their singing and speaking voice. However, to date, no specific measurement of this has been made. METHODS Tenors, baritones, sopranos and mezzo sopranos were ch...
متن کامل